Label Shift Estimators for Non-Ignorable Missing Data

Authors: Joseph Futoma, Andrew C. Miller
Publication date: 27/10/2023

We consider the problem of estimating the mean of a random variable Y subject to non-ignorable missingness, i.e., where the missingness mechanism depends on Y. We connect the auxiliary proxy variable framework for non-ignorable missingness (West and Little, 2013) to the label shift setting (Saerens et al., 2002). Exploiting this connection, we construct an estimator for non-ignorable missing data that uses high-dimensional covariates (or proxies) without the need for a generative model. In synthetic and semi-synthetic experiments, we study the behavior of the proposed estimator, comparing it to commonly used ignorable estimators in both well-specified and misspecified settings. Additionally, we develop a score to assess how consistent the data are with the label shift assumption. We use our approach to estimate disease prevalence using a large health survey, comparing ignorable and non-ignorable approaches. We show that failing to account for non-ignorable missingness can have profound consequences on conclusions drawn from non-representative samples. Comment: 8 pages, 5 figures
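
The link between label shift and non-ignorable missingness described in the abstract can be illustrated with a small sketch. This is not the paper's estimator. It is a toy version of EM prior re-estimation in the style of Saerens et al. (2002), under the assumption that Y is binary, that respondents and nonrespondents share the same distribution of the proxy X given Y, and that the respondent-trained probabilities P(Y=1 | X) are known exactly. The function name `em_target_prior` and all numbers are hypothetical.

```python
def em_target_prior(source_posteriors, source_prior, n_iter=200):
    """Estimate P(Y=1) among nonrespondents by EM prior re-estimation.

    source_posteriors: P(Y=1 | x) from a respondent-trained model,
                       evaluated at each nonrespondent's proxy x.
    source_prior:      P(Y=1) among respondents.
    Assumes label shift: p(x | y) is the same in both groups.
    """
    prior = source_prior
    for _ in range(n_iter):
        weights = []
        for g in source_posteriors:
            # E-step: reweight the respondent posterior by the prior ratio.
            pos = (prior / source_prior) * g
            neg = ((1 - prior) / (1 - source_prior)) * (1 - g)
            weights.append(pos / (pos + neg))
        # M-step: the new prior is the average adjusted posterior.
        prior = sum(weights) / len(weights)
    return prior


# Toy data (hypothetical): a binary proxy X with P(X=1 | Y=1) = 0.8 and
# P(X=1 | Y=0) = 0.2. Respondents have P(Y=1) = 0.5, which gives the
# respondent posteriors P(Y=1 | X=1) = 0.8 and P(Y=1 | X=0) = 0.2.
posterior_given_x = {1: 0.8, 0: 0.2}

# Nonrespondents: only X is observed, with 32% having X = 1.
nonrespondent_x = [1] * 32 + [0] * 68
scores = [posterior_given_x[x] for x in nonrespondent_x]

respondent_mean = 0.5   # what an ignorable analysis would report
nonrespondent_mean = em_target_prior(scores, source_prior=respondent_mean)

response_rate = 0.5     # half of the sample responded
overall_mean = (response_rate * respondent_mean
                + (1 - response_rate) * nonrespondent_mean)

print(round(nonrespondent_mean, 4), round(overall_mean, 4))
```

In this toy setting the proxy distribution among nonrespondents is only consistent with a lower prevalence than among respondents, so the shift-adjusted overall mean falls below the respondent-only mean.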

arXiv.org e-Print Archive

A unifying representation for a class of dependent random measures

Authors: Nicholas J. Foti, Joseph D. Futoma, Daniel N. Rockmore, Sinead Williamson
Publication date: 20/11/2012

We present a general construction for dependent random measures based on thinning Poisson processes on an augmented space. The framework is not restricted to dependent versions of a specific nonparametric model, but can be applied to all models that can be represented using completely random measures. Several existing dependent random measures can be seen as specific cases of this framework. Interesting properties of the resulting measures are derived and the efficacy of the framework is demonstrated by constructing a covariate-dependent latent feature model and topic model that obtain superior predictive performance.

arXiv.org e-Print Archive

CiteSeerX